Chapter 4

Counting on Statistical Software

IN THIS CHAPTER

Examining the evolution of statistical software

Surveying commercial, open source, and free options

Considering code-based versus non–code-based software

Storing data in the cloud

Before statistical software existed, complex regressions that were possible in theory were too complicated to do

manually using real datasets. It wasn’t until the 1960s with the development of the SAS suite of

statistical software that analysts were able to do these calculations. As technology advanced, different

types of software were developed, including open-source software and web-based software.

As you may imagine, all these choices led to competition and confusion among analysts, students, and

organizations using this software. Organizations wonder which statistical packages to implement.

Professors wonder which ones to teach, and students wonder which ones to learn. The purpose of this

chapter is to help you make informed choices about statistical software. We describe and provide

guidance regarding the practical choices you have today among the statistical software available. We

discuss choosing among:

Commercial software, such as SAS and SPSS

Open-source software, such as R and Python

Free software applications, such as G*Power and PS (Power and Sample Size Calculation)

We also provide guidance on how to choose between code-based and non–code-based software, and

end by providing advice on cloud data storage.

Considering the Evolution of Statistical Software

The first widely used commercial statistical software was SAS, and it is still used today.

SAS was originally developed in the 1960s and 1970s to run on mainframe computers. Around 2000,

SAS was adapted to personal computers (known as PC SAS), adding a user-friendly graphical user

interface (GUI). During the growth of SAS, other commercial statistical packages appeared, the most

popular being IBM’s SPSS. SAS continues to be the go-to program for big data analysis, where

analysts can easily access large datasets from servers. In contrast, SPSS, like PC SAS, continues to be used

on personal computers.

If you had taken a college statistics course in the year 2000, your course would likely have taught

either SAS or SPSS. Professors would have made either SPSS or SAS available to you for free or for

a nominal license fee from your college bookstore. If you take a college statistics course today, you